Morphological Processing for English-Tamil Statistical Machine Translation

Authors

  • Loganathan Ramasamy
  • Ondřej Bojar
  • Zdeněk Žabokrtský

Abstract

Various experiments in the literature suggest that in statistical machine translation (SMT), applying either pre-processing or post-processing to morphologically rich languages leads to better translation quality. In this work, we focus on the English-Tamil language pair. We implement suffix-separation rules for both languages and evaluate the impact of this preprocessing on the translation quality of both the phrase-based and the hierarchical models, in terms of BLEU score and a small manual evaluation. The results confirm that our simple suffix-based morphological processing helps to obtain better translation performance. A by-product of our efforts is a new parallel corpus of 190k sentence pairs gathered from the web.
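The abstract does not list the actual suffix-separation rules. As a minimal sketch of the general idea, assuming a hypothetical rule set of three English suffixes (`-ing`, `-ed`, `-s`) and the Tamil plural marker `கள்` (-kaḷ), the Python code below splits a suffix off each word as pre-processing and glues it back as post-processing (needed when Tamil is the target side). The `+suffix` token convention, the minimum stem length, and all function names are illustrative assumptions, not the authors' implementation.

```python
"""Hypothetical suffix separation for SMT pre- and post-processing (sketch)."""

from typing import List, Sequence

# Illustrative rule sets only; these are NOT the rules used in the paper.
ENGLISH_SUFFIXES: Sequence[str] = ("ing", "ed", "s")
TAMIL_SUFFIXES: Sequence[str] = ("கள்",)  # plural marker -kaḷ

# Assumed threshold to avoid splitting short words such as "is" or "bed".
# Note: len() counts Unicode code points, not Tamil grapheme clusters.
MIN_STEM_LEN = 3


def split_token(token: str, suffixes: Sequence[str],
                min_stem: int = MIN_STEM_LEN) -> List[str]:
    """Return [stem, '+suffix'] for the longest suffix that leaves a
    long-enough stem; otherwise return [token] unchanged."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suffix):
            stem = token[: len(token) - len(suffix)]
            if len(stem) >= min_stem:
                return [stem, "+" + suffix]
    return [token]


def separate_suffixes(sentence: str, suffixes: Sequence[str]) -> str:
    """Pre-processing: split every whitespace-separated token of a sentence."""
    out: List[str] = []
    for token in sentence.split():
        out.extend(split_token(token, suffixes))
    return " ".join(out)


def join_suffixes(sentence: str) -> str:
    """Post-processing: glue each '+suffix' token onto the previous token."""
    out: List[str] = []
    for token in sentence.split():
        if token.startswith("+") and len(token) > 1 and out:
            out[-1] += token[1:]
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    english = separate_suffixes("the boys are walking", ENGLISH_SUFFIXES)
    print(english)                 # the boy +s are walk +ing
    print(join_suffixes(english))  # the boys are walking
    print(split_token("ஆசிரியர்கள்", TAMIL_SUFFIXES))  # "teachers": stem + plural
```

Purely string-based rules like these will over-split words such as "news", and they ignore Tamil sandhi changes at morpheme boundaries; a practical system would need exception lists or a morphological analyzer.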

Similar resources

The Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems

Machine translation is one of the major and most active areas of natural language processing. Machine translation (MT) is the automatic translation of one natural language into another using computer-generated instructions. The utility and power of Statistical Machine Translation (SMT) seem destined to change our technological society in profound and fundamental ways. The current state-of-t...

Improving the Performance of English-Tamil Statistical Machine Translation System using Source-Side Pre-Processing

Machine Translation is one of the oldest and most active research areas in Natural Language Processing. Currently, Statistical Machine Translation (SMT) dominates Machine Translation research. Statistical Machine Translation is an approach to Machine Translation that uses models to learn translation patterns directly from data and generalize them to translate new, unseen text. T...

Quality Translation Using the Vauquois Triangle for English to Tamil

The aim of this work is to handle complex sentences and word alignments. Hybrid machine translation automatically acquires knowledge from large amounts of training data in different languages. The system splits complex sentence structures into processable chunks and translates the text from English to Tamil. The system first separates the source text word by word with POS category ...

Tamil IT!: Interactive Speech Translation in Tamil

The Tamil IT! (Interactive Translation) speech translation system is intended to allow unsophisticated users to communicate across the Tamil ↔ English language barrier, without strong domain restrictions, despite the error prone nature of current speech and translation technologies. Achieving this ambitious goal depends in large part on allowing the users to interactively correct recognition an...

NLP Challenges for Machine Translation from English to Indian Languages

This natural language processing work is carried out particularly on English-Kannada/Telugu. Kannada is a language of India. The Kannada language is classified as Dravidian, Southern, Tamil-Kannada, Kannada. Regions spoken: Kannada is also spoken in Karnataka, Andhra Pradesh, Tamil Nadu, and Maharashtra. Population: The total population of people who speak Kannada is 35,346,000, as of 1997. A...

Publication date: 2013